Automated Word Puzzle Generation via Topic Dictionaries 
1. Introduction

Puzzles play a central role in our everyday lives and hold
exciting potential. In education and psychometrics, puzzles
are among the most frequently used assessment tools
(Verguts & Boeck, 2000). A well-known example is the odd one
out puzzle of IQ tests (Carter, 2005). Dedicated word puzzles
exist to test or improve a wide array of skills, including
language skills, verbal aptitude, logical thinking and
general intelligence; one example is the multiple-choice
synonym task of the TOEFL test. Puzzle creation is also a
vibrant subfield of procedural content generation for games
(PCG), the automated generation of game content, for which
demand is continuously increasing thanks to the thriving
popularity of computer and video games.

However, generating and maintaining such puzzles manually is
challenging and expensive, so automated schemes could be of
considerable benefit. A central problem one has to cope with
is variety; otherwise the solver will encounter the same
(kind of) puzzle multiple times. In the case of word puzzles,
new puzzles are needed continuously because (i) novel words
are created on a daily basis (e.g., on blogs), (ii) existing
words acquire new meanings (e.g., 'chat'), and (iii) words go
out of common use (e.g., 'videotape'). Because puzzles differ
so much in nature, the problem of generating them
automatically has been tackled only in quite special cases.
There exist, for example, efficient techniques for (i) sudoku
games, (ii) creating mazes on chessboards, and (iii)
generating puzzles and quests (objectives for the players)
for massively multiplayer online games.

To the best of our knowledge, automated word puzzle
generation is a novel area of this field. Colton (2002)
addressed the problem with a complex theory formation system
to obtain odd one out, analogy and next in the sequence
puzzles. The presented approach, however, relied on highly
structured datasets, which required serious human annotation
effort.

Our goal is to develop a general automated word puzzle
generation method based on

1. an unstructured and unannotated document collection,
   i.e., a simple corpus,

2. a topic model¹, which induces a topic dictionary from the
   input corpus, and

3. a semantic similarity measure of word pairs.

Our method, relying only on these three general components,
is capable of (i) automatically generating a large number of
valuable word puzzles of many different types, including the
odd one out, choose the related word and separate the topics
puzzles:

• In odd one out puzzles, the solver is required to select
  the word that is dissimilar to the other words.

• In choose the related word puzzles, the solver has to
  select the word that is closely related to a previously
  specified group of words.

• In separate the topics puzzles, the solver has to separate
  the set of words into two disjoint sets of related words.

(ii) The method can easily create domain-specific puzzles by
replacing the corpus component. (iii) It is also capable of
automatically generating puzzles with parameterizable
difficulty levels suitable for, e.g., beginners or
intermediate learners. In the following, we present the basic
ideas behind our approach (Section 2) with some numerical
illustrations (Section 3). For more extensive demonstrations
and further details, see (Pinter et al., 2012).



ICML-2012 - Sparsity, Dictionaries and Projections in
Machine Learning and Signal Processing Workshop. Copyright
2012 by the author(s)/owner(s).



¹ Examples include latent semantic analysis, group-structured
dictionaries and latent Dirichlet allocation.
2. Method

Below, we define the key components of our approach for
automated word puzzle generation. The word puzzles we focus
on are produced by (i) generating consistent sets of related
concepts and then (ii) mixing these sets with weakly related
elements: words or other consistent sets. For example, in the
odd one out puzzle, it is sufficient to add a single
unrelated word to a consistent set.

For the generation of consistent sets (see Algorithm 1), we
assume that we are given (i) an unlabeled corpus
$X = [x_1, \ldots, x_M] \in \mathbb{R}^{N \times M}$ and (ii)
a topic model $T$. In corpus $X$, the documents
($x_j \in \mathbb{R}^N$) are represented as weights assigned
to words. For example, in a bag of words representation,
$x_{ij}$ is the number of occurrences of the $i$th word in
the $j$th document. Our assumption for the topic model $T$ is
that it induces a dictionary
$D = T(X) = [d_1, \ldots, d_K] \in \mathbb{R}^{N \times K}$
whose $K$ elements, i.e., topics $d_i$ ($i = 1, \ldots, K$),
describe the documents in the corpus well.
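As a small illustration of the bag of words representation, the
following minimal Python sketch builds a count matrix $X$ from a
toy corpus. The whitespace tokenizer, the helper name
`bag_of_words` and the three toy documents are assumptions made
here for illustration; they are not taken from the experiments.

```python
from collections import Counter

import numpy as np


def bag_of_words(documents):
    """Return (X, vocabulary) where X[i, j] counts word i in document j."""
    # Assumption: lowercase whitespace tokenization is good enough for a toy corpus.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    X = np.zeros((len(vocabulary), len(documents)))  # N words x M documents
    for j, doc in enumerate(tokenized):
        for word, count in Counter(doc).items():
            X[index[word], j] = count
    return X, vocabulary


docs = ["vote election vote", "election candidate", "orbit mass orbit orbit"]
X, vocab = bag_of_words(docs)
```

Each column of `X` is one document $x_j$, and each row corresponds
to one word of the vocabulary.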

Numerous topic models ($T$) belong to this family. For
example, in latent semantic analysis (LSA; Deerwester et al.,
1990), which is perhaps the most widely known topic model,
the singular value decomposition $X = U S V^T$ is computed
and $X$ is approximated by keeping only the first $K$ left
singular vectors (columns of $U$); these vectors form $D$.
Group-structured dictionaries approximate $X$ by adding a
structure-inducing regularization ($\Omega$) on the elements
of $D$, i.e., minimize the cost function (p > 0)
For an excellent review on structured sparsity, see
(Bach et al., 2012). In latent Dirichlet allocation (LDA;
Blei et al., 2003), topics $d_i \in \mathbb{R}^N$
($i = 1, \ldots, K$) are modelled as latent random variables
with a Dirichlet prior. The dictionary $D$ consists of the
estimated $d_i$s.
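The LSA construction described above can be sketched with a
standard SVD routine. This is a minimal illustration only: the
helper name `lsa_dictionary` and the toy matrix are assumptions,
and the sketch says nothing about how the group-structured or
LDA dictionaries are estimated.

```python
import numpy as np


def lsa_dictionary(X, K):
    """Return the N x K matrix of the first K left singular vectors of X."""
    # full_matrices=False gives the economy-size SVD: U is N x min(N, M),
    # with columns ordered by decreasing singular value.
    U, singular_values, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :K]


# Toy corpus (assumption): 5 words x 4 documents, two separated word groups.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
D = lsa_dictionary(X, K=2)
```

On this block-structured toy matrix, each of the two retained
topics is supported on one of the two word groups.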

For word puzzles, we keep only the $k$ ($< N$) most
significant words of the topics as sets:
$m_i = \operatorname{argmax}_k(d_i) \subseteq \{1, \ldots, N\}$
($|m_i| = k$). Topic models can produce junk topics
(Alsumait et al., 2009). For example, common function words,
such as did, said, etc., can form a topic. These topics
result in inconsistent sets, whose words are not closely
related. To evaluate the consistency of sets and discard
inconsistent ones, which is highly desirable in word puzzles,
we define the consistency of the resulting word sets $m_i$
using the semantic relatedness of the word pairs they
contain, measured by explicit semantic analysis (ESA;
Gabrilovich & Markovitch, 2009). In ESA, given a concept
repository, such as the articles of Wikipedia, the
relatedness of two words ($w$, $w'$) is measured as the
similarity of their concept based representations:



$$\cos\left(\varphi_{\mathrm{ESA}}(w), \varphi_{\mathrm{ESA}}(w')\right). \qquad (3)$$



The basic assumption of ESA is that if a word appears
frequently in a Wikipedia article, then that article
represents the meaning of the word well.
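The cosine in Eq. (3) can be illustrated on toy concept vectors.
In the sketch below, the word-by-article weight matrix `phi` is
hypothetical and stands in for real ESA representations, which
would be derived from a concept repository such as Wikipedia;
the helper name `cosine_relatedness` is likewise an assumption.

```python
import numpy as np


def cosine_relatedness(concept_vectors):
    """Pairwise cosine similarity of the rows of a (words x concepts) matrix."""
    norms = np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero.
    unit = concept_vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Hypothetical weights of three words in three articles.
words = ["vote", "election", "orbit"]
phi = np.array([
    [3.0, 1.0, 0.0],  # vote
    [2.0, 2.0, 0.0],  # election
    [0.0, 0.0, 4.0],  # orbit
])
S = cosine_relatedness(phi)
```

Words that share articles (vote, election) obtain a high
relatedness value, while words with disjoint articles
(vote, orbit) obtain zero.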

Since in word puzzles even a single word too weakly connected
to the others can make the resulting puzzles ambiguous, it is
prudent to rate each set ($m_i$) according to the word that
is the least related to the other words in the set. The
semantic relatedness measure (Eq. (3)) is also not perfectly
accurate: false positives (two words seem related when in
reality they are not) or false negatives (the similarity
measure gives a small value even though the two words are
related) may appear.

To cope with these challenges in determining the consistency
of a set ($m$), one can proceed as follows (see Algorithm 1,
line 7). Robustness to false negatives can be increased by
defining the relatedness of two words based on all paths
($\max_{\mathrm{path}(i,j)}$) between them in the semantic
similarity ($s_{uv}$, Eq. (3)) graph $G = (m, S|_m)$.
Robustness to false positives can be achieved by taking the
minimum relatedness on the path
($\min_{e \in \mathrm{path}(i,j)} s_e$). Finally, to ensure
that the quality of a set is determined by the two most
dissimilar words in the set, one can compute the minimum over
all $i \neq j$ word pairs
($\min_{i \neq j} \mathrm{sim}(i,j)$). A set is defined to be
consistent if this quality of the set is above a given
threshold $\delta$. It can be shown (Jungnickel, 2007) that
the $\mathrm{sim}(i,j)$ similarity values are equal to the
weight, i.e., the minimal edge weight, of the unique path
between $i$ and $j$ in the maximum spanning tree ($T$) of
$G$. Moreover, since the $s_{uv}$ values are non-negative, it
is sufficient to find the edge with minimal weight in $T$ to
determine the quality of the set. For an illustration, see
Fig. 1.
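The consistency check described above can be sketched in Python
as follows. This is an illustrative sketch under stated
assumptions, not the implementation used in the experiments:
the function names, the use of Prim's algorithm for the maximum
spanning tree, and the toy matrices `D` and `S` are all choices
made here. The function `brute_force_quality` evaluates the
max-min path definition directly (a Floyd-Warshall style
recursion) and serves only as a cross-check of the spanning
tree shortcut.

```python
import itertools

import numpy as np


def top_k_sets(D, k):
    """Indices of the k largest entries of every topic (column) of D."""
    return [
        tuple(sorted(int(i) for i in np.argsort(-D[:, t])[:k]))
        for t in range(D.shape[1])
    ]


def set_quality(S, members):
    """Smallest edge weight of the maximum spanning tree of S restricted to members."""
    members = list(members)
    in_tree = {members[0]}
    quality = np.inf
    while len(in_tree) < len(members):
        # Prim's algorithm: add the heaviest edge leaving the current tree.
        weight, new_node = max(
            (S[u, v], v) for u in in_tree for v in members if v not in in_tree
        )
        quality = min(quality, weight)
        in_tree.add(new_node)
    return quality


def brute_force_quality(S, members):
    """min over word pairs of (max over paths of the min edge), by definition."""
    members = list(members)
    sim = S[np.ix_(members, members)].astype(float).copy()
    n = len(members)
    for mid in range(n):  # Floyd-Warshall recursion with (max, min) operations
        for i in range(n):
            for j in range(n):
                sim[i, j] = max(sim[i, j], min(sim[i, mid], sim[mid, j]))
    return min(sim[i, j] for i, j in itertools.permutations(range(n), 2))


def consistent_sets(D, S, k, delta):
    """Keep the top-k word sets whose quality exceeds the threshold delta."""
    return [m for m in top_k_sets(D, k) if set_quality(S, m) > delta]


# Toy data (assumption): 5 words, 2 topics, hypothetical relatedness values.
D = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.0, 0.9], [0.1, 0.8]])
S = np.full((5, 5), 0.01)
np.fill_diagonal(S, 1.0)
edges = {(0, 1): 0.8, (0, 2): 0.7, (1, 2): 0.05,
         (3, 4): 0.9, (1, 3): 0.02, (1, 4): 0.03}
for (i, j), w in edges.items():
    S[i, j] = S[j, i] = w

kept = consistent_sets(D, S, k=3, delta=0.1)
```

In the toy data, the pair (1, 2) has a low direct relatedness
(a false negative), yet the first set is still rated by the
much stronger path through word 0, whereas the second set is
rejected because word 1 is weakly connected to the others.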

Having the consistent sets ($\mathcal{C}$) at hand, word
puzzles can be easily generated by mixing unrelated elements
with the sets in $\mathcal{C}$. The pseudocode of odd one out
puzzle generation is given in Algorithm 2. The puzzle
generator has two parameters, $\eta_1$ and $\eta_2$.
Parameter $\eta_2$ determines whether the consistent set and
the unrelated element are dissimilar enough so that they can
be mixed to form a word puzzle. Parameter $\eta_1$ allows the
creation of puzzles of different difficulty (beginner,
intermediate, etc.): by increasing $\eta_1$, the relatedness
of the additional elements to the consistent set is
increased; therefore, the puzzle is made harder. Similar
constructions can be applied to generate choose the related
word or separate the topics puzzles (Pinter et al., 2012).

Algorithm 1 Identify Consistent Sets ($\mathcal{C}$)

1: Input: corpus $X = [x_1, \ldots, x_M] \in \mathbb{R}^{N \times M}$, topic model $T$, size of consistent sets $k$, semantic similarity of words $S = [s_{ij}] \in \mathbb{R}^{N \times N}$, consistency threshold $\delta$
2: $\mathcal{C} \leftarrow \emptyset$ // there is no consistent set at the beginning
3: $D = T(X) = [d_1, \ldots, d_K] \in \mathbb{R}^{N \times K}$ // compute the topic dictionary
4: $M = \{m_1, \ldots, m_K\}$, $m_i = \operatorname{argmax}_k(d_i) \subseteq \{1, \ldots, N\}$, $|m_i| = k$ // k most significant words of the topics
5: for all $m \in M$ do
6:     $G = (m, S|_m)$ // semantic-weighted graph of the candidate consistent set $m$
7:     if $\min_{(i,j) \in m \times m,\, i \neq j} \mathrm{sim}(i,j) > \delta$, with $\mathrm{sim}(i,j) := \max_{\mathrm{path}(i,j)} \min_{e \in \mathrm{path}(i,j)} s_e$, then // similarity of the 2 most dissimilar words
8:         $\mathcal{C} \leftarrow \mathcal{C} \cup \{m\}$ // set $m$ is declared to be consistent

Figure 1. Checking the consistency of 3 sets of words. Here,
sets contain k = 5 words. Bold edges: maximal spanning tree.
Dashed line: edge with minimal weight in the tree; determines
consistency. (a): a highly consistent set; all the words are
strongly connected to the word vote. (b): a consistent set;
some of the relatedness values (e.g., between care and
treatment) are lower than one would expect; the method is
robust: a relatively high consistency value is assigned to
the set. (c): an inconsistent set; the word class is weakly
connected to the others.
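The odd one out generator of Algorithm 2 can be sketched as
follows. The sketch assumes that the relatedness of a word to a
set is the maximum pairwise relatedness, as in the algorithm;
the function name, the `max_tries` bound (added here so that
the repeat/until loop always terminates) and the toy matrix `S`
are assumptions made for illustration.

```python
import random

import numpy as np


def odd_one_out_puzzles(consistent_sets, S, eta1, eta2, rng, max_tries=1000):
    """For each consistent set, draw a word w with eta1 < max_c S[c, w] < eta2."""
    n_words = S.shape[0]
    puzzles = []
    for members in consistent_sets:
        candidates = [w for w in range(n_words) if w not in members]
        for _ in range(max_tries):  # bounded stand-in for the repeat/until loop
            w = rng.choice(candidates)
            a = max(S[c, w] for c in members)  # max. relatedness of w to the set
            if eta1 < a < eta2:
                puzzles.append((tuple(members), w))
                break
    return puzzles


# Hypothetical relatedness matrix of 5 words; words 0-2 form a consistent set.
S = np.array([
    [1.00, 0.80, 0.70, 0.05, 0.30],
    [0.80, 1.00, 0.60, 0.01, 0.20],
    [0.70, 0.60, 1.00, 0.02, 0.10],
    [0.05, 0.01, 0.02, 1.00, 0.00],
    [0.30, 0.20, 0.10, 0.00, 1.00],
])
rng = random.Random(0)
beginner = odd_one_out_puzzles([(0, 1, 2)], S, eta1=0.0, eta2=0.1, rng=rng)
intermediate = odd_one_out_puzzles([(0, 1, 2)], S, eta1=0.1, eta2=0.4, rng=rng)
```

With the lower interval only the barely related word 3 is
accepted as the odd one out, while the higher interval accepts
the more related word 4, which yields a harder puzzle.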

3. Illustration 

Here, we illustrate the efficiency of our method in automated
odd one out puzzle generation.

Consistent sets are a cornerstone of the presented method. In
the first experiment, we compared the number of consistent
sets of a given quality ($> \delta$) that (i) the different
topic models, LSA, LDA and OSDL (Szabo et al., 2011), a
recent group-structured dictionary learning technique, could
produce (ii) on two corpora ($X$). The two corpora were the
English Wikipedia with M = 10,000 samples and the
domain-specific corpus of NIPS proceedings (M = 1,740).
Consistent sets were composed of k = 4 words. The number of
topics was chosen to be K = 400. Our results are summarized
in Fig. 2. According to the figure, out of the studied topic
models, LDA performs the best, with OSDL following closely
behind. LSA does not seem applicable to word puzzle
generation: it produces very few consistent sets. The methods
perform better in terms of the number of consistent sets on
Wikipedia than on the corpus of NIPS proceedings. This is
expected, since M, the number of articles in Wikipedia,
greatly exceeded that of the NIPS proceedings.

Algorithm 2 Odd One Out Puzzle Generation

Input: consistent sets $\mathcal{C}$, minimal (maximal) relatedness to consistent sets $\eta_1$ ($\eta_2$)
for all $C \in \mathcal{C}$ do
    repeat
        select random word $w$
        $a \leftarrow \max_{i \in C} s_{iw}$ // max. relatedness of $w$ to $C$
    until $\eta_1 < a < \eta_2$
    output ($C$, $w$) puzzle

Figure 2. Number of consistent sets produced by the different
topic models as a function of threshold $\delta$. (a): corpus
of Wikipedia, (b): NIPS proceedings.
Consistent set of words Odd one out



Table 1. Odd one out - beginner puzzles. 



Consistent set of words                          Odd one out

cao       wei       liu         emperor          king
superman  clark     luthor      kryptonite       batman
devil     demon     hell        soul             body
egypt     egyptian  alexandria  pharaoh          bishop
singh     guru      sikh        saini            delhi
language  dialect   linguistic  spoken           sound
mass      force     motion      velocity         orbit
voice     speech    hearing     sound            view
athens    athenian  pericles    corinth          ancient
function  problems  polynomial  equation         physical

Table 2. Odd one out - intermediate puzzles.



In the second experiment, we demonstrate the odd one out
puzzles generated from Wikipedia ($X$). We chose
$\delta = 0.1$ to obtain a significant number of sufficiently
good consistent sets (see Fig. 2). First, we illustrate the
beginner puzzles (Table 1), where we chose parameters
$\eta_1 = 0.005$ and $\eta_2 = 0.02$. In other words, the
puzzles generated consist of a consistent set of related
words and an unrelated word. Beginner puzzles can be solved
at first glance by a person who understands the language and
has a wide vocabulary, for example, the puzzles vote,
election, candidate, voters, sony, or olympic, tournament,
world, championship, acid. These could be useful, e.g., for
beginner language learners or, with a suitable corpus, for
children. Some puzzles require specific knowledge about a
topic. To solve the puzzles harry, potter, wizard, ron,
manchester and superman, clark, luthor, kryptonite, division,
the solver must be familiar with the book, film, comic, etc.
To solve austria, german, austrian, vienna, scotland,
geographic knowledge is needed.

Second, we illustrate intermediate puzzles (Table 2) obtained
with $\eta_1 = 0.1$ and $\eta_2 = 0.2$. Although the
presented method is based on semantic similarity, it is able
to create surprisingly subtle puzzles. In the puzzle voice,
speech, hearing, sound, view, the word view has a different
modality than the others. To solve the puzzle cao, wei, liu,
emperor, king, the solver should be familiar with the Three
Kingdoms period of Chinese history. For egypt, egyptian,
alexandria, pharaoh, bishop, knowledge of Egyptian history is
required; for athens, athenian, pericles, corinth, ancient,
familiarity with the Peloponnesian War. In singh, guru, sikh,
saini, delhi, all the words except delhi are related to
Sikhism. The puzzle function, problems, polynomial, equation,
physical can be solved only with a basic knowledge of
mathematics and physics. These results demonstrate the
efficiency of our automated word puzzle generation approach.